On the optimality of averaging in distributed statistical learning

نویسندگان

  • Jonathan D. Rosenblatt
  • Boaz Nadler
  • B. NADLER
چکیده

A common approach to statistical learning with Big-data is to randomly split it among m machines and learn the parameter of interest by averaging the m individual estimates. In this paper, focusing on empirical risk minimization or equivalently M-estimation, we study the statistical error incurred by this strategy. We consider two large-sample settings: first, a classical setting where the number of parameters p is fixed, and the number of samples per machine n →∞. Second, a high-dimensional regime where both p, n →∞ with p/n → κ ∈ (0, 1). For both regimes and under suitable assumptions, we present asymptotically exact expressions for this estimation error. In the fixed-p setting, we prove that to leading order averaging is as accurate as the centralized solution. We also derive the second-order error terms, and show that these can be non-negligible, notably for nonlinear models. The high-dimensional setting, in contrast, exhibits a qualitatively different behavior: data splitting incurs a first-order accuracy loss, which increases linearly with the number of machines. The dependence of our error approximations on the number of machines traces an interesting accuracy-complexity tradeoff, allowing the practitioner an informed choice on the number of machines to deploy. Finally, we confirm our theoretical analysis with several simulations.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Low-Area/Low-Power CMOS Op-Amps Design Based on Total Optimality Index Using Reinforcement Learning Approach

This paper presents the application of reinforcement learning in automatic analog IC design. In this work, the Multi-Objective approach by Learning Automata is evaluated for accommodating required functionalities and performance specifications considering optimal minimizing of MOSFETs area and power consumption for two famous CMOS op-amps. The results show the ability of the proposed method to ...

متن کامل

Optimal Simple Step-Stress Plan for Type-I Censored Data from Geometric Distribution

Abstract. A simple step-stress accelerated life testing plan is considered when the failure times in each level of stress are geometrically distributed under Type-I censoring. The problem of choosing the optimal plan is investigated using the asymptotic variance-optimality as well as determinant-optimality and probability-optimality criteria. To illustrate the results of the paper, an example i...

متن کامل

Computational Limits of A Distributed Algorithm for Smoothing Spline

In this paper, we explore statistical versus computational trade-off to address a basic question in the application of a distributed algorithm: what is the minimal computational cost in obtaining statistical optimality? In smoothing spline setup, we observe a phase transition phenomenon for the number of deployed machines that ends up being a simple proxy for computing cost. Specifically, a sha...

متن کامل

Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)

In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic Environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide mobile robot through the dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...

متن کامل

Multiple Optimality Guarantees in Statistical Learning

Multiple Optimality Guarantees in Statistical Learning by John C Duchi Doctor of Philosophy in Computer Science and the Designated Emphasis in Communication, Computation, and Statistics University of California, Berkeley Professor Michael I. Jordan, Co-chair Professor Martin J. Wainwright, Co-chair Classically, the performance of estimators in statistical learning problems is measured in terms ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014